This notebook provides a template for implementing, in stages, the functionality required to complete this project. If you need additional code that cannot be included in the notebook, make sure the Python code is imported successfully and included in your submission. Sections whose header begins with 'Implementation' indicate where you should begin your implementation. Some implementation sections are optional and are marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.
Visualize the German Traffic Signs Dataset. This is open-ended; suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!
The pickled data is a dictionary with 4 key/value pairs:
# Load pickled data
import pickle
# TODO: fill this in based on where you saved the training and testing data
training_file = 'traffic-signs-data/train.p'
testing_file = 'traffic-signs-data/test.p'
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
import numpy as np
print(X_train.shape)
print(X_test.shape)
unique_items, counts = np.unique(y_train, return_counts=True)
print(len(unique_items))
### To start off let's do a basic data summary.
# TODO: number of training examples
n_train = X_train.shape[0]
# TODO: number of testing examples
n_test = X_test.shape[0]
# TODO: what's the shape of an image?
image_shape = X_train.shape[1:]
# TODO: how many classes are in the dataset
n_classes = len(unique_items)
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
y_list = list(y_test)
total = len(y_list)
fig, axarr = plt.subplots(9,5, figsize=(8, 15))
i, j = 0, 0
for label in set(y_list):
    img = X_test[y_list.index(label)]
    freq = 100. * y_list.count(label) / total
    axarr[i, j].imshow(img)
    axarr[i, j].axis("off")
    axarr[i, j].set_title("Class #%d (%.2f%%)" % (label, freq), fontsize=8)
    j += 1
    if j > 4:
        j = 0
        i += 1
axarr[8, 3].axis("off")
axarr[8, 4].axis("off")
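The per-class percentages in the plot titles above are computed with `list.count`, which rescans the label list once per class. As a minimal sketch, assuming integer class ids that start at 0, the same frequencies can be obtained in one pass with `np.bincount`. The small `labels` array below is made up for illustration and stands in for `y_test`.

```python
import numpy as np

# Hypothetical labels standing in for y_test (integer class ids starting at 0).
labels = np.array([0, 1, 1, 2, 2, 2, 2, 1])

# counts[c] is the number of examples whose label is c.
counts = np.bincount(labels)

# Percentage of the dataset that belongs to each class.
freq_percent = 100.0 * counts / labels.size

for class_id, (count, pct) in enumerate(zip(counts, freq_percent)):
    print("Class #%d: %d examples (%.2f%%)" % (class_id, count, pct))
```

The resulting `counts` array could also be passed to `plt.bar` to plot the count of each sign.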
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. You are not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Preprocess the data here.
### Feel free to use as many code cells as needed.
def normalize(image_data):
    """
    Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]
    :param image_data: The image data to be normalized
    :return: Normalized image data
    """
    # TODO: Implement Min-Max scaling for grayscale image data
    Xmin = 0
    Xmax = 255
    a = 0.1
    b = 0.9
    return a + ((image_data-Xmin) * (b-a) / (Xmax - Xmin))
X_train = normalize(X_train)
X_test = normalize(X_test)
from sklearn.preprocessing import LabelBinarizer
# Turn labels into numbers and apply One-Hot Encoding
encoder = LabelBinarizer()
encoder.fit(y_train)
y_train = encoder.transform(y_train)
y_test = encoder.transform(y_test)
# Change to float32, so it can be multiplied against the features in TensorFlow, which are float32
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
print('Labels One-Hot Encoded')
Describe the techniques used to preprocess the data.
Answer: As with the MNIST dataset, the image data is first normalized and the labels are then encoded using one-hot encoding. The normalization applies Min-Max scaling to map the image data to the range [0.1, 0.9].
Numerical conditioning matters in neural network training because ill-conditioning is a common cause of slow and inaccurate results. For well-conditioned data, the negative gradient points more directly toward the minimum of the error surface, so training algorithms that take steps in the direction of the negative gradient are likely to work well.
The labels use one-hot encoding: each label becomes a vector in which the correct class has a value of 1 and all other entries are 0. A softmax function converts the classification scores into probabilities, so for a well-trained model the probability of the correct class is close to 1 and the others are close to 0.
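To make the two preprocessing steps concrete, here is a small sketch in plain NumPy. It is an illustration rather than the notebook's own code: `min_max_scale` and `one_hot` are hypothetical helper names, and the pixel values and class ids are made up.

```python
import numpy as np

def min_max_scale(x, a=0.1, b=0.9, x_min=0.0, x_max=255.0):
    """Linearly map values from [x_min, x_max] to [a, b]."""
    return a + (x - x_min) * (b - a) / (x_max - x_min)

def one_hot(class_ids, num_classes):
    """Return one row per label with a 1 in the column of its class."""
    encoded = np.zeros((len(class_ids), num_classes), dtype=np.float32)
    encoded[np.arange(len(class_ids)), class_ids] = 1.0
    return encoded

# Three made-up 8-bit pixel values: darkest, intermediate, brightest.
pixels = np.array([0, 51, 255], dtype=np.float32)
scaled = min_max_scale(pixels)
print(scaled)  # the darkest pixel maps to 0.1 and the brightest to 0.9

# Two made-up labels in a 4-class problem.
labels = one_hot(np.array([2, 0]), num_classes=4)
print(labels)
```

For the intermediate pixel, the arithmetic is 0.1 + 51 * 0.8 / 255 = 0.26.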
### Generate additional data (if you want to!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
from sklearn.model_selection import train_test_split
# Get randomized datasets for training and validation
train_features, valid_features, train_labels, valid_labels = train_test_split(
X_train,
y_train,
test_size=0.15,
random_state=832289)
print('Training features and labels randomized and split.')
import pickle
import os
pickle_file = 'traffic.pickle'
if not os.path.isfile(pickle_file):
    print('Saving data to pickle file...')
    try:
        with open(pickle_file, 'wb') as pfile:
            pickle.dump(
                {
                    'train_dataset': train_features,
                    'train_labels': train_labels,
                    'valid_dataset': valid_features,
                    'valid_labels': valid_labels,
                    'test_dataset': X_test,
                    'test_labels': y_test,
                },
                pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', pickle_file, ':', e)
        raise
print('Data cached in pickle file.')
Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?
Answer: 85% of the training data is used to train the classifier and 15% is held out to validate it. The testing data, read from 'traffic-signs-data/test.p', is left untouched. The validation set is split off so that the trained classifier can be evaluated without using the testing data. As a result, the classifier learns nothing about the testing data, and its predictions on the testing data show whether the model generalizes.
%matplotlib inline
# Load the modules
import pickle
import math
import numpy as np
import tensorflow as tf
from tqdm import tqdm
import matplotlib.pyplot as plt
# Reload the data
pickle_file = 'traffic.pickle'
with open(pickle_file, 'rb') as f:
    pickle_data = pickle.load(f)
    train_dataset = pickle_data['train_dataset']
    train_labels = pickle_data['train_labels']
    valid_dataset = pickle_data['valid_dataset']
    valid_labels = pickle_data['valid_labels']
    test_dataset = pickle_data['test_dataset']
    test_labels = pickle_data['test_labels']
    del pickle_data  # Free up memory
print('Data and modules loaded.')
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
epochs = 17
batch_size = 128
patch_size = 5
image_size = 32
num_channels = 3
depth = 32
num_hidden = 100
num_labels = 43
graph = tf.Graph()
def inference(data):
    # Variables.
    # Note: each call creates a new set of variables, so separate calls
    # to inference() do not share weights.
    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # Two stride-2 convolutions shrink each spatial dimension by a factor of 4.
    layer3_weights = tf.Variable(tf.truncated_normal(
        [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
        [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
    # Model: two strided convolutions followed by two fully connected layers.
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    # Flatten the feature maps before the fully connected layers.
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer: Model architecture: Input layer (32x32x3) -> Conv layer (5x5, stride 2) + ReLU (depth=32) -> Conv layer (5x5, stride 2) + ReLU (depth=32) -> FC + ReLU (100 units) -> FC (43 units)
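The size of the first fully connected layer's input follows from the convolution strides. With 'SAME' padding, TensorFlow's output size along each spatial dimension is ceil(input / stride). The sketch below works through that arithmetic for the values used in this notebook; `same_padding_output_size` is a hypothetical helper name introduced only for illustration.

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size of a convolution with 'SAME' padding."""
    return math.ceil(input_size / stride)

image_size = 32  # input images are 32x32
depth = 32       # number of feature maps in each conv layer

size_after_conv1 = same_padding_output_size(image_size, 2)
size_after_conv2 = same_padding_output_size(size_after_conv1, 2)

# Number of values per image once the feature maps are flattened.
flattened = size_after_conv2 * size_after_conv2 * depth

print(size_after_conv1, size_after_conv2, flattened)
```

The spatial size goes from 32 to 16 to 8, so each image is flattened to 8 * 8 * 32 = 2048 values before the 100-unit layer.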
### Train your model here.
### Feel free to use as many code cells as needed.
new_images= np.zeros((1, 32, 32, 3))
new_labels= np.zeros((1,43))
with graph.as_default():
    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    tf_valid_labels = tf.constant(valid_labels)
    tf_test_labels = tf.constant(test_labels)
    # Constants built from the all-zero arrays defined above; reassigning
    # new_images or new_labels later does not change them.
    tf_new_dataset = tf.constant(new_images)
    tf_new_labels = tf.constant(new_labels)
    # Training computation.
    logits = inference(tf_train_dataset)
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))
    # Optimizer.
    train_op = tf.train.GradientDescentOptimizer(0.05).minimize(loss_op)
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(tf_train_labels, 1))
    accuracy_op = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    tf_valid_dataset = tf.cast(tf_valid_dataset, tf.float32)
    tf_test_dataset = tf.cast(tf_test_dataset, tf.float32)
    tf_new_dataset = tf.cast(tf_new_dataset, tf.float32)
    tf_valid_labels = tf.cast(tf_valid_labels, tf.float32)
    tf_test_labels = tf.cast(tf_test_labels, tf.float32)
    logits_valid = inference(tf_valid_dataset)
    logits_test = inference(tf_test_dataset)
    logits_new = inference(tf_new_dataset)
    valid_prediction = tf.nn.softmax(logits_valid)
    test_prediction = tf.nn.softmax(logits_test)
    new_prediction = tf.nn.softmax(logits_new)
def eval_data(dataset, labels):
    """
    Given a dataset as input returns the loss and accuracy.
    Uses the global session `sess`; examples beyond the last full batch are skipped.
    """
    shape = dataset.shape
    steps_per_epoch = shape[0] // batch_size
    num_examples = steps_per_epoch * batch_size
    total_acc, total_loss = 0, 0
    for step in range(steps_per_epoch):
        # Walk through the dataset being evaluated in consecutive full batches.
        offset = step * batch_size
        batch_data = dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = labels[offset:(offset + batch_size), :]
        loss, acc = sess.run([loss_op, accuracy_op], feed_dict={tf_train_dataset: batch_data, tf_train_labels: batch_labels})
        total_acc += (acc * batch_data.shape[0])
        total_loss += (loss * batch_data.shape[0])
    return total_loss/num_examples, total_acc/num_examples
with tf.Session(graph=graph) as sess:
    #saver = tf.train.Saver()
    sess.run(tf.initialize_all_variables())
    print('Initialized')
    steps_per_epoch = train_dataset.shape[0] // batch_size
    num_examples = steps_per_epoch * batch_size
    # Train model
    for i in range(epochs):
        for step in range(steps_per_epoch):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            loss = sess.run(train_op, feed_dict={tf_train_dataset: batch_data, tf_train_labels: batch_labels})
        val_loss, val_acc = eval_data(valid_dataset, valid_labels)
        print("EPOCH {} ...".format(i+1))
        print("Validation loss = {}".format(val_loss))
        print("Validation accuracy = {}".format(val_acc))
    # Evaluate on the test data
    test_loss, test_acc = eval_data(test_dataset, test_labels)
    print("Test loss = {}".format(test_loss))
    print("Test accuracy = {}".format(test_acc))
    #saver.save(sess, "traffic_sign_cnn_clf.ckpt")
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer: The optimizer is a gradient descent optimizer with a learning rate of 0.05.
The batch size is 128.
Validation loss reaches its minimum at around 17 epochs, so training stops there to prevent overfitting.
Some parameters, such as the depth of each layer and the learning rate, were tuned to get the best test accuracy.
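Choosing the number of epochs from the validation loss can be sketched as follows. This is an illustration only: `best_epoch` is a hypothetical helper, and the loss values are made up rather than taken from the training run above.

```python
def best_epoch(val_losses):
    """Return (1-based epoch number, loss) at the minimum validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1, val_losses[best_index]

# Hypothetical per-epoch validation losses, for illustration only.
val_losses = [2.10, 1.40, 0.95, 0.80, 0.84, 0.91]

epoch, loss = best_epoch(val_losses)
print("Lowest validation loss {} at epoch {}".format(loss, epoch))
```

In the training loop, the per-epoch `val_loss` values could be appended to such a list and inspected after training to decide where to stop.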
What approach did you take in coming up with a solution to this problem?
Answer: Traffic sign images show relatively constrained variability in appearance. The challenges come from real-world variation, such as viewpoint, lighting and reflections, resolution, physical damage, and graffiti. This problem should therefore be similar to the MNIST problem, although the number of labels here is larger. To approach a solution, I took the CNN model from Udacity's Deep Learning course and searched for the best depth for the CNN layers.
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
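A minimal sketch of building that mapping with the standard `csv` module is shown below. The header names `ClassId` and `SignName` are an assumption about the file layout, and the two sample rows are a hypothetical excerpt, so check them against your copy of signnames.csv.

```python
import csv
import io

# Hypothetical excerpt; the header names ClassId and SignName are an assumption.
sample_csv = """ClassId,SignName
12,Priority road
25,Road work
"""

def load_sign_names(csv_file):
    """Build a dict mapping integer class id -> sign name from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {int(row["ClassId"]): row["SignName"] for row in reader}

sign_names = load_sign_names(io.StringIO(sample_csv))
print(sign_names[25])
```

With the real file, the same function could be called as `load_sign_names(open('signnames.csv', newline=''))`, ideally inside a `with` block.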
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Load the images and plot them here.
### Feel free to use as many code cells as needed.
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
image_files = os.listdir("internet/")
for image_file in image_files:
    img = mpimg.imread("internet/" + image_file)
    imgplot = plt.imshow(img)
    plt.show()
import cv2
import numpy as np
def normalize(image_data):
    """
    Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]
    :param image_data: The image data to be normalized
    :return: Normalized image data
    """
    # TODO: Implement Min-Max scaling for grayscale image data
    Xmin = 0
    Xmax = 255
    a = 0.1
    b = 0.9
    return a + ((image_data-Xmin) * (b-a) / (Xmax - Xmin))
new_images = []
# Show each image
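for image in image_files:
    image = './internet/' + image
    img = cv2.imread(image)
    # cv2.imread returns BGR; convert to RGB to match the training data and plt.imshow
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (32,32))
    plt.imshow(img)
    plt.show()
    img = normalize(img)
    # Append to the test_images array
    new_images.append(img)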
# convert to numpy array
new_images = np.asarray(new_images, dtype=np.float32)
#print(type(new_images))
print(new_images.shape)
new_labels = np.array([14,18,40,1,4,2,31,34,25,31])
print(new_labels.shape)
from sklearn.preprocessing import LabelBinarizer
encoder = LabelBinarizer()
encoder.fit(train_labels)
new_labels = encoder.transform(new_labels)
new_labels = new_labels.astype(np.float32)
print(new_labels.shape)
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It would be helpful to plot the images in the notebook.
Answer: I chose 10 images. Possible challenges include the shape of the sign, the background color of the image, the angle from which the image was taken, and the distance from which it was taken. I included some easy images so the test accuracy should not be 0.0.
# Repeat 128 times so the data becomes bigger than the batch size
new_images=np.repeat(new_images,128, axis=0)
new_labels=np.repeat(new_labels,128, axis=0)
from sklearn.utils import shuffle
new_images, new_labels = shuffle(new_images, new_labels)
print(new_images.shape)
print(new_labels.shape)
with tf.Session(graph=graph) as sess:
    saver = tf.train.Saver()
    sess.run(tf.initialize_all_variables())
    print('Initialized')
    steps_per_epoch = train_dataset.shape[0] // batch_size
    num_examples = steps_per_epoch * batch_size
    # Train model
    for i in range(epochs):
        for step in range(steps_per_epoch):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            loss = sess.run(train_op, feed_dict={tf_train_dataset: batch_data, tf_train_labels: batch_labels})
        val_loss, val_acc = eval_data(valid_dataset, valid_labels)
        print("EPOCH {} ...".format(i+1))
        print("Validation loss = {}".format(val_loss))
        print("Validation accuracy = {}".format(val_acc))
    # Evaluate on the test data
    test_loss, test_acc = eval_data(new_images, new_labels)
    value, indices = tf.nn.top_k(new_prediction, k=10)
    value_test, indices_test = tf.nn.top_k(test_prediction, k=10)
    print("top 10 value from internet images= {}".format(value.eval()))
    print("top 10 value indices from internet images= {}".format(indices.eval()))
    print("top 10 value from test dataset= {}".format(value_test.eval()))
    print("top 10 value indices from test dataset= {}".format(indices_test.eval()))
    print("Test loss = {}".format(test_loss))
    print("Test accuracy = {}".format(test_acc))
Is your model able to perform equally well on captured pictures when compared to testing on the dataset?
Answer: The model does not perform equally well on captured pictures. However, performance should improve if the internet test images are more similar to the dataset. The model could recognize the triangular wild-animal sign and the triangular curve-to-left/right signs, as it was trained on them. It could not recognize signs with a square shape.
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
Answer:
top 10 value from internet images= [[ 0.40967494 0.17912006 0.13142563 0.08877142 0.04981499 0.04916997 0.02045132 0.01644639 0.01167106 0.00641837]]
top 10 value indices from internet images= [[32 21 16 23 25 12 28 22 41 27]]
top 10 value from test dataset= [[ 0.72918373 0.13641556 0.04792577 ..., 0.00430589 0.00263355 0.00163265]
 [ 0.44430354 0.26832485 0.17347392 ..., 0.00665833 0.00274228 0.00192503]
 [ 0.56412899 0.18347855 0.11751836 ..., 0.00646097 0.00517882 0.00245577]
 ...,
 [ 0.56187278 0.12470841 0.10998023 ..., 0.00609347 0.00533893 0.00505749]
 [ 0.53022188 0.14578231 0.1039624 ..., 0.00499013 0.00481291 0.00476824]
 [ 0.51130807 0.12344549 0.11752731 ..., 0.00636673 0.00531728 0.00463831]]
top 10 value indices from test dataset= [[40 29 13 ..., 10 42 32]
 [29 40 22 ..., 10 42 12]
 [40 29 22 ..., 32 42 14]
 ...,
 [40 22 29 ..., 12 34 32]
 [40 29 22 ..., 34 14 12]
 [40 22 29 ..., 12 34 32]]
The top ten predictions for the internet images are 32 21 16 23 25 12 28 22 41 27:
32,End of all speed and passing limits
21,Double curve
16,Vehicles over 3.5 metric tons prohibited
23,Slippery road
25,Road work
12,Priority road
28,Children crossing
22,Bumpy road
41,End of no passing
27,Pedestrians
The actual labels are [[14],[18],[40],[1],[4],[2],[31],[34],[25],[31]]. Of these, only class 25 makes it into the model's top ten predictions.
Looking at the top ten values and indices from the test dataset, labels 40 and 29 sometimes have close values, such as 0.44430354 and 0.26832485. It is possible that the model has not yet picked up the subtle differences between the images with those two labels.
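For readers who want to inspect top-k predictions outside the TensorFlow graph, the idea behind softmax followed by `tf.nn.top_k` can be sketched in NumPy. This is an illustrative re-implementation, not the notebook's code: the `softmax` and `top_k` helpers and the logits below are made up.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def top_k(probs, k):
    """Return the k largest probabilities per row and their class indices."""
    indices = np.argsort(-probs, axis=1)[:, :k]
    values = np.take_along_axis(probs, indices, axis=1)
    return values, indices

# Made-up scores for one image in a 4-class problem.
logits = np.array([[2.0, 0.5, 1.0, -1.0]])
probs = softmax(logits)
values, indices = top_k(probs, k=3)

print(indices)  # class ids ordered from most to least probable
print(values)   # the matching probabilities
```

A large gap between the first and second value indicates a confident prediction, while close values, like the 40 and 29 case above, indicate uncertainty.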
The following code shows, for each class the model predicted, an example image from the test dataset.
# Load pickled data
import pickle
testing_file = 'traffic-signs-data/test.p'
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_test, y_test = test['features'], test['labels']
predicted_labels = np.array([32, 21, 16, 23, 25, 12, 28, 22, 41, 27])
internet_list = list(predicted_labels)
y_list = list(y_test)
total = len(y_list)
fig, axarr = plt.subplots(4,3, figsize=(8, 15))
i, j = 0, 0
for label in set(internet_list):
    img = X_test[y_list.index(label)]
    freq = 100. * y_list.count(label) / total
    axarr[i, j].imshow(img)
    axarr[i, j].axis("off")
    axarr[i, j].set_title("Class #%d (%.2f%%)" % (label, freq), fontsize=8)
    j += 1
    if j > 2:
        j = 0
        i += 1
axarr[3, 0].axis("off")
axarr[3, 1].axis("off")
axarr[3, 2].axis("off")
The following code shows the images from the internet.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
new_images = []
for image in image_files:
    image = './internet/' + image
    img = plt.imread(image)
    img = cv2.resize(img, (32,32))
    # Append to the test_images array
    new_images.append(img)
internet_labels = np.array([14,18,40,1,4,2,31,34,25,31])
y_list = list(internet_labels)
total = len(y_list)
fig, axarr = plt.subplots(4,3, figsize=(8, 15))
i, j = 0, 0
for label in set(y_list):
    img = new_images[y_list.index(label)]
    freq = 100. * y_list.count(label) / total
    axarr[i, j].imshow(img)
    axarr[i, j].axis("off")
    axarr[i, j].set_title("Class #%d (%.2f%%)" % (label, freq), fontsize=8)
    j += 1
    if j > 2:
        j = 0
        i += 1
axarr[3, 0].axis("off")
axarr[3, 1].axis("off")
axarr[3, 2].axis("off")
If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.
Answer:
Code for loading new images is provided above.
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.